
Parametrize zsync block size for huge files #47

Merged (2 commits) Jul 27, 2020

Conversation

@andrii-suse (Collaborator) commented Jun 4, 2020

Related to #22.
With the fix from #46, tested with 200G files in the local environs tests:

BIG_FILE_SIZE=200G REBUILD=1 bash -x mirrorbrain/t/docker/environs/07-zsync.sh

(The second run adds the following parameters in the test script:)

sed -i '/dbname = mirrorbrain/a zsync_hashes = 1\nchunk_size = 33554432\nzsync_block_size_for_1G = 1048576' mb9*/mirrorbrain.conf
chunk_size     | zsync_block_size_for_1G | RAM usage | CPU time
---------------|-------------------------|-----------|---------
default (256K) | default (4K)            | 5.864G    | 35:33
32M            | 1M                      | 100M      | 32:56

@andrii-suse (Collaborator, Author) commented
I've confirmed the same with zsyncmake as well: using a non-default block size of 1M, the zsync sums for a 200G file drop from ~500M down to ~2M, so it is worth having this configurable for huge files.
It just needs the zsumblocksize column in the hash table altered, because the current smallint can hold at most 32K.
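A migration along these lines would do it. This is a sketch, assuming a PostgreSQL database (the test config above references `dbname = mirrorbrain`); the table and column names are taken from the comment, but the exact target type is my assumption:

```sql
-- Widen zsumblocksize so it can hold block sizes above 32K.
-- smallint tops out at 32767; integer allows values up to ~2^31,
-- which comfortably covers a 1M (1048576) block size.
ALTER TABLE hash ALTER COLUMN zsumblocksize TYPE integer;
```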

@andrii-suse force-pushed the configure_zsync_block_size branch from 1b03d99 to e0bf0b0 on June 4, 2020 20:51
@andrii-suse marked this pull request as ready for review June 8, 2020 08:25
@darix (Member) commented Jul 6, 2020

Does this have any impact on mod_mirrorbrain or on clients actually using zsync files? Does that mean the smallest chunk a client then needs to download in case of changes is 1MB?

@andrii-suse (Collaborator, Author) commented
I don't think it has an impact on mod_mirrorbrain.
Below is my investigation based on a few hours of research, so it may be wrong.
The zsync algorithm pre-calculates checksums for each block; the client program then only needs to sync the blocks whose checksums differ.
A small (default) block size only matters when some fragments of the file differ, which is probably not the case for .iso files.

Moreover, if my calculations are correct, a 200G file with the default 4K block size requires ~500M of checksums to be stored and sent, which makes little sense. (And the crash in #22 happens when it tries to store those checksums in the DB.) The correct solution here is to use a much bigger block.
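The arithmetic behind those figures can be sketched like this. The ~10 bytes of checksum data per block is a ballpark assumption (zsync stores a weak rsum plus a truncated strong checksum per block; the exact widths vary), not the precise header layout:

```python
# Rough estimate of zsync metadata size for a 200 GiB file,
# illustrating the numbers discussed above (not MirrorBrain code).
GIB = 1024 ** 3
file_size = 200 * GIB

for block_size in (4 * 1024, 1024 * 1024):  # default 4K vs. proposed 1M
    blocks = file_size // block_size
    approx_bytes = blocks * 10  # assumed ~10 bytes of checksums per block
    print(f"block={block_size:>8}: {blocks:,} blocks, "
          f"~{approx_bytes / 2**20:.0f} MiB of sums")
```

With a 4K block this comes out to roughly 500 MiB of checksum data, and with a 1M block roughly 2 MiB, matching the sizes reported for zsyncmake above.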

@andrii-suse andrii-suse mentioned this pull request Jul 16, 2020
@andrii-suse force-pushed the configure_zsync_block_size branch from e0bf0b0 to 692ad2d on July 27, 2020 12:01
@darix darix merged commit b639fe7 into openSUSE:opensuse Jul 27, 2020